Lower bounds are derived on the average mean-squared error of optical phase estimation in a
Bayesian framework using classical rate-distortion theory in conjunction with the classical capacity
of the lossy and lossless optical channel under phase modulation. With no optical loss, the bound
displays Heisenberg-limit scaling of the mean-squared error $\delta\Phi^2 \sim 1/N_S^2$, where $N_S$ is the
average number of photons in the probe state. In the presence of nonzero loss, a lower bound with
standard-quantum-limit (SQL) asymptotic scaling is derived. The bounds themselves are
non-asymptotic and valid for any prior probability distribution of the phase.

I. INTRODUCTION

It has long been argued that the mean-squared error in sensing an optical phase shift can
exhibit, at best, an inverse quadratic scaling with the mean number of photons in the quantum
state used to sense the phase [1]. Recently, following some claims that this so-called
"Heisenberg limit" (or H limit) on phase estimation may be beaten [2-4], several authors have
revived the subject by providing rigorous proofs of lower bounds, not limited to optical
interferometry, with H limit scaling [5-9]. Interestingly, these proofs use diverse techniques,
namely, the speed limit on quantum evolutions [5, 8], the entropic uncertainty relations [6, 9],
and the quantum Ziv-Zakai bound [7]. In this paper, we develop another technique for obtaining
lower bounds, classical rate-distortion theory, which was introduced into quantum metrology in
[10], and apply it to both lossless and lossy optical phase estimation.

Rate-distortion theory [11] is a branch of classical information theory [12, 13] that was
introduced in ref. [12] and elaborated in ref. [14] by Shannon, and forms the theoretical basis
for the lossy compression of data sources. In the simplest scenario involving a continuous data
source, the source generates an output modeled as a random variable $U$, and we wish to map $U$
to another random variable $V$ (one that perhaps has smaller storage requirements) in such a way
that a predefined distortion measure $d$, such as the mean-squared error between $U$ and $V$, is
kept below a tolerable level $d^*$. Roughly speaking, rate-distortion theory tells us how much
information must remain in $V$ in order to do so, and shows that coded schemes can achieve this
compression limit. A fascinating historical introduction to the theory and practice of lossy
data compression may be found in ref. [15].

The development of quantum information theory [16, 17] in the past few decades has been much
influenced by the ideas of classical information theory, including rate-distortion theory. One
of the first results of quantum information theory, the noiseless coding theorem of Schumacher
[18], is a quantum version of Shannon's noiseless source coding theorem, which itself
corresponds to rate-distortion theory with the allowed distortion set to zero. More recently
[19], there have been efforts to formulate a quantum rate-distortion theory applicable to the
lossy compression of quantum rather than classical information sources.

In this paper, we apply classical rate-distortion theory to the problem of estimation of an
optical phase using quantum probe states whose mean energy, i.e., photon number, is
constrained. We adopt a Bayesian approach in which an arbitrary prior distribution of the phase
is assumed and the squared error is averaged over this distribution. We first consider the
ideal lossless case, in which we obtain a rigorous lower bound for the mean-squared error that
exhibits H limit scaling. This derivation applies to any multimode probe state as long as the
energy in the modes sensing the phase is constrained to be at most $N_S$. We then allow for the
presence of loss, and show that, for any probe state with a single mode undergoing the phase
shift, the mean-squared phase error is bounded from below by a quantity that scales inversely
as the mean number of photons in the probe state, i.e., it exhibits the "Standard Quantum
Limit" (SQL) scaling of coherent states [1].



II. RATE-DISTORTION THEORY AND THE 
INFORMATION TRANSMISSION INEQUALITY 

Consider the random variable $\Phi$ representing the phase shift to be sensed, and let its
prior probability density be $P_\Phi(\phi)$. In rate-distortion theory, $\Phi$ is viewed as a
data source with differential entropy $h(\Phi)$ given by

$$h(\Phi) = -\int_0^{2\pi} d\phi\, P_\Phi(\phi) \ln P_\Phi(\phi) \qquad (1)$$

and measured in nats/symbol. For another random variable $\hat{\Phi}$ representing an estimate
of $\Phi$, we can define the mean-squared error (or MSE) as

$$\delta\Phi^2 := E\left[(\hat{\Phi} - \Phi)^2\right] = \int_0^{2\pi} d\phi\, P_\Phi(\phi) \int_0^{2\pi} d\hat{\phi}\, P_{\hat{\Phi}|\Phi}(\hat{\phi}|\phi)\, (\hat{\phi} - \phi)^2, \qquad (2)$$

where $P_{\hat{\Phi}|\Phi}(\hat{\phi}|\phi)$ is the conditional density of $\hat{\Phi}$ given $\Phi$.

The mean-squared error (2) is an example of a distortion measure, denoted
$d(\Phi, \hat{\Phi})$ in general, which is an ensemble average of a numerical function of the
source output and the estimate that measures how far apart they are for the purposes of a
particular application. A variety of distortion measures may be used [13], but only the
mean-squared error distortion measure is considered in this paper. The rate-distortion
function $R(D)$ is defined as [13, 14]

$$R(D) = \inf_{P_{\hat{\Phi}|\Phi}(\hat{\phi}|\phi)\,:\; d(\Phi,\hat{\Phi}) \le D} I(\Phi; \hat{\Phi}), \qquad (3)$$

where the quantity being minimized is the mutual information $I(\Phi; \hat{\Phi})$ between the
source and estimate and the infimum is over all conditional distributions
$P_{\hat{\Phi}|\Phi}(\hat{\phi}|\phi)$ that yield average distortion less than or equal to $D$.
Note that $R(D)$ depends on the prior distribution $P_\Phi(\phi)$ and the distortion measure
$d(\Phi, \hat{\Phi})$.

The rate-distortion function $R(D)$ may be thought of informally as the amount of
non-redundant information per symbol emitted by the source, given that we allow for a
distortion of up to $D$ in a reconstructed version of the source. Examples of the computation
of $R(D)$ for some standard sources and distortion measures may be found in [13, 14], although
numerical evaluation is usually required for an arbitrary source. In general, the function
$R(D)$ and its inverse $D(R)$ are decreasing and convex in their respective arguments. For the
mean-squared error distortion measure, the following lower bound on $R(D)$ may be used [14, 15]:

$$R(D) \ge \frac{1}{2} \ln \frac{Q_\Phi}{D}, \qquad (4)$$

where

$$Q_\Phi = \frac{1}{2\pi e}\, e^{2h(\Phi)} \qquad (5)$$

is the entropy power of $\Phi$ [13]. Note that the bound is useful for $D \in (0, Q_\Phi]$
(outside which it can be replaced with zero) and is convex and decreasing on this interval.
The following lower bound on $D(R)$ follows from Eq. (4):

$$D(R) \ge Q_\Phi\, e^{-2R} \equiv \underline{D}(R). \qquad (6)$$
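As a minimal numerical sketch of Eqs. (4)-(6), the snippet below evaluates the entropy power and the two lower bounds for a phase that is uniform on an interval. The function names and the choice of a uniform prior on $[0, 2\pi)$ are illustrative assumptions, not part of the original derivation.

```python
import math

def entropy_power_uniform(length):
    """Entropy power Q = exp(2h) / (2*pi*e) of a phase uniform on an interval.

    A uniform density on an interval of the given length has differential
    entropy h = ln(length) nats.
    """
    h = math.log(length)
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

def rate_lower_bound(distortion, q):
    """Lower bound of Eq. (4): R(D) >= (1/2) ln(Q/D), replaced by 0 for D > Q."""
    return max(0.0, 0.5 * math.log(q / distortion))

def distortion_lower_bound(rate, q):
    """Lower bound of Eq. (6): D(R) >= Q * exp(-2R)."""
    return q * math.exp(-2.0 * rate)

q_uniform = entropy_power_uniform(2.0 * math.pi)  # illustrative prior on [0, 2*pi)
for rate in (0.5, 1.0, 2.0):
    d = distortion_lower_bound(rate, q_uniform)
    # The two bounds are inverse functions of each other on (0, Q].
    print(f"R = {rate:.1f} nats: D >= {d:.4f}, bound on R at that D = "
          f"{rate_lower_bound(d, q_uniform):.4f}")
```

Each additional nat of rate lowers the distortion bound by a factor of $e^2$, which is the mechanism behind the scaling results derived later.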



The operational significance of the rate-distortion function is elucidated by the positive
and converse parts of the noisy source coding theorems of Shannon [14]. For our purpose of
obtaining lower bounds on the achievable distortion, the converse part is of primary
relevance. The fundamental result, called the Information Transmission Inequality in [15], is
stated below (see Fig. 1).

FIG. 1. Block diagram of the estimation scenario to which the Information Transmission
Inequality applies. The blow-up shows how each of the parallel channels $\mathcal{C}$ is
realized by a modulation of $X$ into density operators of a quantum system followed by a POVM
measurement on the system.



Theorem 1 (Information Transmission Inequality - Theorem 1 of ref. [14]). Let
$\mathbf{\Phi} = \Phi_1, \ldots, \Phi_k$ be $k$ independent and identically distributed (i.i.d.)
source outputs, each with prior distribution $P_\Phi(\phi)$. For a given distortion measure
$d(\Phi, \hat{\Phi})$, let the rate-distortion function of the source be $R(D)$ nats/symbol.
Suppose an encoder $\mathcal{E}$ maps $\mathbf{\Phi}$ to an $n$-symbol-long codeword
$\mathbf{X} = X_1, \ldots, X_n$ that is transmitted over a channel $\mathcal{C}$ with capacity
$C$ nats/use. Let the channel output codeword be $\mathbf{Y} = Y_1, \ldots, Y_n$, which is mapped
by a decoder $\mathcal{D}$ to an estimate $\hat{\mathbf{\Phi}} = \hat{\Phi}_1, \ldots, \hat{\Phi}_k$
of $\mathbf{\Phi}$. Defining a per-symbol average distortion measure
$d(\mathbf{\Phi}, \hat{\mathbf{\Phi}}) = \frac{1}{k} \sum_{i=1}^{k} d(\Phi_i, \hat{\Phi}_i)$, we have

$$d(\mathbf{\Phi}, \hat{\mathbf{\Phi}}) \ge D\!\left(\frac{n}{k}\, C\right), \qquad (7)$$

where $D(\cdot)$ is the function inverse to $R(D)$.



Note that, unlike the positive part of the noisy source coding theorem, which applies in the
asymptotic limit of long codes with $n \to \infty$, Theorem 1 applies to any given system of
the form of Fig. 1. The application of Theorem 1 to obtain performance lower bounds in
classical estimation and communication is well known [20]. Its application to quantum
metrology was first proposed in [10] and is made as follows. The key point is to implement the
classical channel $\mathcal{C}$ appearing in Fig. 1 using a quantum system in the following
way. Given a codeword symbol $X$, we implement a modulation map $\mathcal{M}$ that takes the
symbol $X$ and a given probe state $\rho$ on the Hilbert space $\mathcal{H}$ of the quantum
system of interest into another density operator $\rho_X$ on $\mathcal{H}$. We then make a
measurement on $\rho_X$ described by a Positive-Operator-Valued Measure (POVM) $\{\Pi_y\}$
[16], whose outcome $Y$ is the output codeword symbol (see inset to Fig. 1). Any such choice of
probe state, modulation map, and POVM measurement induces a probability transition matrix
$P_{Y|X}(y|x)$, i.e., a classical channel, for which a channel capacity $C$ may be defined.
This $C$ may then be used in Eq. (7) to yield a lower bound on the distortion. The calculation
of $C$ can be made to incorporate any constraints relevant to the sensing problem, e.g., an
energy constraint on the probe state, a constraint on the kind of modulation allowed, or a
constraint on the measurement POVM.
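Once a probe, modulation map, and POVM have been fixed and the input alphabet discretized, the capacity of the induced transition matrix can be computed numerically. The sketch below uses the standard Blahut-Arimoto iteration, which is not a method used in this paper; the binary symmetric channel at the end is a purely hypothetical stand-in for an induced $P_{Y|X}$.

```python
import math

def blahut_arimoto(channel, tol=1e-12, max_iter=10000):
    """Capacity in nats/use of a discrete memoryless channel.

    channel[x][y] is the transition probability P(y|x). The iteration keeps a
    lower and an upper bound on the capacity and stops when they meet.
    """
    n_in, n_out = len(channel), len(channel[0])
    r = [1.0 / n_in] * n_in                      # input distribution, start uniform
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(r[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # Relative entropy D(P(.|x) || q) for every input symbol (0 ln 0 := 0).
        d = [sum(p * math.log(p / q[y]) for y, p in enumerate(row) if p > 0.0)
             for row in channel]
        weights = [r[x] * math.exp(d[x]) for x in range(n_in)]
        total = sum(weights)
        lower, upper = math.log(total), max(d)   # bracket the capacity
        r = [w / total for w in weights]         # multiplicative update
        if upper - lower < tol:
            break
    return lower

# Hypothetical induced channel: binary symmetric with crossover probability 0.1.
flip = 0.1
bsc = [[1.0 - flip, flip], [flip, 1.0 - flip]]
print(f"capacity = {blahut_arimoto(bsc):.6f} nats/use")
```

The returned value can be inserted into Eq. (7) with the chosen $n$ and $k$. For the continuous phase alphabet considered in the paper, any discretization is an additional approximation.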

In the context of quantum optics, we can consider a fixed class of probe states, e.g.,
coherent or quadrature-squeezed states, certain kinds of modulation such as phase modulation
or displacement in phase space, and standard measurements such as photon counting, homodyne or
heterodyne detection. The channel capacities under a mean energy constraint for these probe,
modulation, and measurement choices are known in many cases [10, 21]. Using this approach,
performance bounds for the communication or sensing of a Gaussian source were obtained in
[10]. In addition, for lossless estimation of a uniform phase parameter, a lower bound
exhibiting standard quantum limit (SQL) scaling, i.e., the behavior
$\delta\Phi^2 \sim 1/N_S$, was obtained for coherent-state probes, and a lower bound exhibiting
H limit scaling was obtained for a quadrature-squeezed-state (or two-photon coherent state
(TCS)) probe.

In this paper, we are concerned with lower bounds on the MSE for lossless and lossy phase
estimation under a mean energy constraint $N_S$ on the probe state $\rho$ used to sense the
phase. We will not consider coding over multiple instances of the phase, i.e., we set
$k = n = 1$ in Fig. 1 so that $\mathbf{X} = \mathbf{\Phi} = \Phi$ and
$\mathbf{Y} = \hat{\mathbf{\Phi}} = \hat{\Phi}$. This assumption of no coding is realistic in
the single-parameter estimation problem considered here, though it may be relaxed in more
general situations. In line with our earlier remarks, a large part of our work consists in
estimating the classical capacity of the channel resulting from phase modulation of the probe
state according to the value of $\Phi$ while allowing arbitrary POVM measurements on the
modulated states.



III. H LIMIT FOR LOSSLESS PHASE 
ESTIMATION 

We consider the following general strategy for estimating a phase shift. An arbitrary pure
probe state $|\psi\rangle_{IS}$ on the Hilbert space $\mathcal{H}_{IS}$ of $M$ 'signal' and
$M'$ 'idler' modes with average energy $N_S$ in the signal modes is prepared [22]. The signal
modes each undergo phase shifting by the unknown amount $\phi$ while the idler modes remain
unaffected (see Fig. 2 for the $M = M' = 1$ case) so that we have the output state

$$\rho_\phi = U_\phi\, \rho_{IS}\, U_\phi^\dagger \qquad (8)$$

for

$$U_\phi = \left[\bigotimes_{m=1}^{M} e^{i\phi\, \hat{a}_m^\dagger \hat{a}_m}\right] \otimes I_I, \qquad (9)$$

where $\{\hat{a}_m\}_{m=1}^{M}$ are the annihilation operators of the $M$ signal modes and
$I_I$ is the identity operator on the $M'$ idler modes. Finally, the optimum POVM is
implemented on the joint state to yield the best MSE estimate $\hat{\Phi}$ of $\Phi$. The
above description corresponds to an entanglement-assisted parallel strategy considered in
ref. [23]. It is also a special case of the general image sensing framework of ref. [23] with
the number of 'pixels' $P$ set to one, the number of hypotheses $M$, identified as the
'number' of different phase-shift values, going to infinity, the prior probability of the
'image' that shifts by $\phi$ set equal to $P_\Phi(\phi)$, and the cost function taken to be
the mean-squared error. We will apply the results of ref. [23] extensively in the following.

FIG. 2. Schematic of a lossless or lossy phase sensing scenario. The phase shift $\phi$ acts
on the signal mode 'S'. In the lossless case, we have $\eta = 1$ and the idler mode 'I' is not
used. In the lossy case, an NDS state of the 'S' and 'I' modes is prepared and a joint
measurement is made to obtain the phase estimate $\hat{\Phi}$.

In this section, we consider the lossless case in which 
the beam splitter of Fig. 2 has transmittance $\eta = 1$. It
was shown in ref. [23] (see the section on lossless im- 
age sensing) that both multiple modes and idler entan- 
glement are unnecessary for optimal performance in any 
image sensing problem in the above framework. That is, 
a single-mode signal-only probe state is sufficient so that 
we may set M = 1 and M' = 0. 

The unrestricted capacity $C(N_S)$ (in nats/use) of a single-mode noiseless channel under a
mean energy constraint on the channel input ensemble is well known [24] and is given by

$$C(N_S) = (N_S + 1)\ln(N_S + 1) - N_S \ln N_S. \qquad (10)$$

By 'unrestricted', we mean that there is no constraint on either the POVM or the modulation
map other than that the output ensemble of the modulation map has mean energy $N_S$. This
energy constraint is satisfied by the ensemble $\{P_\Phi(\phi), \rho_\phi\}$, since the
$\{\rho_\phi\}$ all have the same energy, namely $N_S$, as the probe $\rho_{IS}$, so that
$C(N_S) \ge C_{ph}(N_S)$, where the latter capacity is that achieved by the phase-modulated
ensemble $\{P_\Phi(\phi), \rho_\phi\}$ optimized over all POVMs. Since $D(\cdot)$ of Eq. (7) is
a decreasing function of its argument, we may apply Eq. (7) to get (recall that $k = n = 1$)

$$\delta\Phi^2 \ge D(C_{ph}(N_S)) \ge D(C(N_S)) \ge \underline{D}(C(N_S)) = Q_\Phi\, \frac{(1 + 1/N_S)^{-2N_S}}{(N_S+1)^2} \ge \frac{Q_\Phi}{e^2 (N_S+1)^2}, \qquad (11)$$

where we have used the lower bound (6) and, in the last step, the inequality
$(1 + 1/N_S)^{N_S} \le e$.

Eq. (11) has the form of an H limit [5-9] with asymptotic behavior
$\delta\Phi^2 \sim 1/N_S^2$. However, as the derivation shows, it is a non-asymptotic result
valid for arbitrary prior distribution $P_\Phi(\phi)$, arbitrary probe state, arbitrary
POVMs, and for all values of $N_S$.
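The chain of inequalities in Eq. (11) is easy to check numerically. The following sketch evaluates the capacity of Eq. (10) and both closed-form expressions on the right of Eq. (11); the uniform prior on $[0, 2\pi)$, which has entropy power $2\pi/e$, and the function names are illustrative assumptions.

```python
import math

def capacity_nats(n_s):
    """Unrestricted single-mode capacity of Eq. (10), in nats per use."""
    if n_s == 0:
        return 0.0
    return (n_s + 1.0) * math.log(n_s + 1.0) - n_s * math.log(n_s)

def mse_bound_exact(n_s, q_phi):
    """Q_Phi * exp(-2 C(N_S)): the tighter expression in Eq. (11)."""
    return q_phi * math.exp(-2.0 * capacity_nats(n_s))

def mse_bound_h_limit(n_s, q_phi):
    """Q_Phi / (e^2 (N_S + 1)^2): the looser closed form in Eq. (11)."""
    return q_phi / (math.e ** 2 * (n_s + 1.0) ** 2)

q_phi = 2.0 * math.pi / math.e   # entropy power of a phase uniform on [0, 2*pi)
for n_s in (0.1, 1.0, 10.0, 100.0, 1000.0):
    exact = mse_bound_exact(n_s, q_phi)
    loose = mse_bound_h_limit(n_s, q_phi)
    print(f"N_S = {n_s:7.1f}: Q e^(-2C) = {exact:.3e} >= "
          f"Q / (e^2 (N_S+1)^2) = {loose:.3e}, ratio = {exact / loose:.4f}")
```

The ratio of the two expressions tends to one as $N_S$ grows, so the simpler closed form loses little at large photon numbers.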

It is interesting to compare the bound (11) to that obtained by Hall and Wiseman in ref. [9].
Eq. (17) of [9] reads (after removing the factor of 2 multiplying $N_S$ as explained in [9])

$$\delta\Phi^2 \ge \frac{1}{2\pi e^3 P_{max}^2\, (N_S+1)^2}, \qquad (12)$$

where $P_{max} \ge 1/(2\pi)$ is the maximum value of $P_\Phi(\phi)$. If $P_\Phi(\phi)$ is
sharply peaked at some point of $[0, 2\pi)$ or not bounded from above (as when a particular
$\phi$ has a nonzero probability of occurrence), the right-hand side of (12) can be smaller
than that of (11), which remains nonzero as long as $P_\Phi(\phi)$ is supported on some
interval of finite length. On the other hand, for $\Phi$ distributed uniformly in an interval
of length $L$, the bound (12) coincides with Eq. (11), as may be verified by setting
$Q_\Phi = L^2/(2\pi e)$ in (11) and $P_{max} = 1/L$ in (12). This is rather remarkable as the
two results were obtained using ostensibly quite different methods.
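The comparison between Eqs. (11) and (12) can be made concrete with a short sketch. Both priors below are hypothetical examples chosen for illustration: a uniform density on an interval of length $L = 1.5$, and a piecewise-constant density equal to 1.5 on $[0, 0.5)$ and 0.25 on $[0.5, 1.5)$.

```python
import math

def entropy_power(h):
    """Entropy power of Eq. (5) for differential entropy h in nats."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

def rate_distortion_bound(n_s, q_phi):
    """Right-hand side of Eq. (11): Q_Phi / (e^2 (N_S + 1)^2)."""
    return q_phi / (math.e ** 2 * (n_s + 1.0) ** 2)

def hall_wiseman_bound(n_s, p_max):
    """Right-hand side of Eq. (12): 1 / (2 pi e^3 P_max^2 (N_S + 1)^2)."""
    return 1.0 / (2.0 * math.pi * math.e ** 3 * p_max ** 2 * (n_s + 1.0) ** 2)

n_s = 10.0

# Uniform prior on an interval of length L: h = ln L and P_max = 1 / L.
length = 1.5
uniform_rd = rate_distortion_bound(n_s, entropy_power(math.log(length)))
uniform_hw = hall_wiseman_bound(n_s, 1.0 / length)

# Piecewise-constant prior: h = -(integral of P ln P) over the two pieces.
h_step = -(0.5 * 1.5 * math.log(1.5) + 1.0 * 0.25 * math.log(0.25))
step_rd = rate_distortion_bound(n_s, entropy_power(h_step))
step_hw = hall_wiseman_bound(n_s, 1.5)

print(f"uniform prior: Eq. (11) = {uniform_rd:.4e}, Eq. (12) = {uniform_hw:.4e}")
print(f"step prior:    Eq. (11) = {step_rd:.4e}, Eq. (12) = {step_hw:.4e}")
```

Since $h(\Phi) \ge -\ln P_{max}$ for any bounded density, the entropy power satisfies $Q_\Phi \ge 1/(2\pi e P_{max}^2)$, which is why the first bound is never the smaller of the two in this comparison.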

We may wonder if the bound (11) may be strengthened by using the capacity $C_{ph}(N_S)$ of the
lossless channel restricted to phase modulation rather than the unrestricted capacity
$C(N_S)$. Indeed, the ensembles well known to achieve $C(N_S)$ are the number states with a
thermal distribution [24] and coherent states with a circularly-symmetric Gaussian
distribution on phase space [25], neither of which is generated by phase modulation of a
single probe state. However, for the case of uniform $P_\Phi(\phi)$, it has been recently
shown that $C_{ph}(N_S)$ can be made to approach $C(N_S)$ arbitrarily closely using phase
modulation on an appropriate probe state [26].

In sum, while the bounds in refs. [5-9] are applicable
to more general Hamiltonians than the linear phase shift 
Hamiltonian considered here, we have presented a new 
derivation of a non-asymptotic H limit for linear optical 
phase estimation that is satisfied by every probe state 
and can deal simply with arbitrary prior information. 



IV. LOWER BOUND ON THE MSE IN LOSSY 
PHASE ESTIMATION 

We now show that the above technique can be extended to phase estimation in the presence of
loss. Consider again the setup of Fig. 2 with nonzero loss so that the beam splitter has
transmittance $\eta < 1$ [27]. It follows from Theorem 1 of ref. [23] that, among all probe
states $|\psi\rangle_{IS}$ with a given photon probability distribution $\{p_{\mathbf{n}}\}$,
$\mathbf{n} \in \{0, 1, \ldots\}^M$, in the $M$ signal modes, a state of the form

$$|\psi\rangle_{IS} = \sum_{\mathbf{n}} \sqrt{p_{\mathbf{n}}}\; |\theta_{\mathbf{n}}\rangle_I\, |\mathbf{n}\rangle_S \qquad (13)$$

minimizes the MSE, where $\{|\theta_{\mathbf{n}}\rangle_I\}$ is any orthonormal set of idler
states. Such states are called NDS (Number-Diagonal Signal) states. Therefore, a lower bound
on the phase estimation MSE valid for any NDS probe state of $M$ signal modes with mean signal
energy $N_S$ is also a lower bound for any $M$ signal-mode state (with any kind of
entanglement with idler modes) of mean signal energy $N_S$ [22]. For lossy phase estimation,
unlike the lossless case of Section III, using a probe with multiple signal modes may decrease
the MSE from the single-mode case for the same total signal energy $N_S$. However, we will
consider just the $M = 1$ case of a single signal-idler pair in the following.

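Consider using the probe state $|\psi\rangle_{IS}$ of Eq. (13) with $M = 1$ so that
$\mathbf{n} = n$, an integer index. Under phase modulation with prior density $P_\Phi(\phi)$,
we obtain the output ensemble $\{P_\Phi(\phi), \rho_\phi\}$ given via Eq. (8), where
$\rho_{IS}$ may be written as

$$\rho_{IS} = \sum_{l} q_l\, |\chi_l\rangle_{IS}\langle\chi_l|, \qquad (14)$$

where $q_l$ is the probability that $l$ photons are lost to the environment during the
beam-splitter interaction of Fig. 2 and is given by

$$q_l = \sum_{n \ge l} p_n B_\eta(n, l) = \sum_{n \ge l} p_n \binom{n}{l} \eta^{n-l} (1-\eta)^{l}. \qquad (15)$$

The states $\{|\chi_l\rangle_{IS}\}$ in Eq. (14) are given by

$$|\chi_l\rangle_{IS} = \frac{1}{\sqrt{q_l}} \sum_{n \ge l} \sqrt{p_n B_\eta(n, l)}\; |\theta_n\rangle_I\, |n - l\rangle_S \qquad (16)$$

and form an orthonormal set, i.e., ${}_{IS}\langle\chi_l|\chi_{l'}\rangle_{IS} = \delta_{l,l'}$,
by virtue of the fact that the $\{|\theta_n\rangle_I\}$ are orthonormal.

Thus, $\rho_\phi$ of Fig. 2 is given by

$$\rho_\phi = \sum_{l} q_l\, |\chi_l(\phi)\rangle_{IS}\langle\chi_l(\phi)|, \qquad (17)$$

where

$$|\chi_l(\phi)\rangle_{IS} = \frac{1}{\sqrt{q_l}} \sum_{n \ge l} \sqrt{p_n B_\eta(n, l)}\; e^{i(n-l)\phi}\, |\theta_n\rangle_I\, |n - l\rangle_S. \qquad (18)$$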
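The loss distribution of Eq. (15) is a binomial mixture and can be tabulated directly. The sketch below does so for a truncated photon-number distribution; the Poisson choice of $p_n$, the truncation point, and the parameter values are illustrative assumptions rather than anything prescribed by the paper.

```python
import math

def loss_distribution(p, eta):
    """Distribution q_l of the number of lost photons, as in Eq. (15).

    p[n] is the signal photon-number distribution (truncated to len(p) terms)
    and eta is the beam-splitter transmittance, so each photon is lost
    independently with probability 1 - eta.
    """
    q = [0.0] * len(p)
    for n, p_n in enumerate(p):
        for l in range(n + 1):
            q[l] += p_n * math.comb(n, l) * eta ** (n - l) * (1.0 - eta) ** l
    return q

def poisson(mean, n_max):
    """Poisson photon statistics truncated at n_max (an illustrative choice)."""
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)]

eta = 0.7
p = poisson(mean=2.0, n_max=60)
q = loss_distribution(p, eta)
mean_lost = sum(l * q_l for l, q_l in enumerate(q))
print(f"sum of q_l = {sum(q):.12f}, mean number of lost photons = {mean_lost:.6f}")
```

For any $p_n$, the mean number of lost photons is $(1 - \eta) N_S$, which gives a quick sanity check on the tabulated values.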



We adopt the following strategy for obtaining a lower bound on the MSE in lossy phase
estimation for an NDS input. We will first obtain an upper bound $\overline{C}_{ph}(N_S)$ on
the capacity of a lossy phase modulation channel restricted to an NDS input state
$|\psi\rangle_{IS}$ of mean signal energy $N_S$. Since the lower bound $\underline{D}(R)$ of
Eq. (6) is decreasing in $R$, the Information Transmission Inequality gives the lower bound
$\underline{D}(\overline{C}_{ph}(N_S))$ on the MSE. Finally, since an $M = 1$ NDS state
minimizes the MSE among all states with $M = 1$, the previous bound is also valid for all
$M = 1$ states with mean signal energy $N_S$.

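In order to estimate $C_{ph}(N_S)$, we use the Holevo bound [16, 17, 28], which states

$$C_{ph}(N_S) \le S(\bar{\rho}) - \int_0^{2\pi} d\phi\, P_\Phi(\phi)\, S(\rho_\phi) \qquad (19)$$

$$= S(\bar{\rho}) - S(\rho_{IS}), \qquad (20)$$

where

$$\bar{\rho} = \int_0^{2\pi} d\phi\, P_\Phi(\phi)\, \rho_\phi \qquad (21)$$

is defined to be the average output density operator, and $S(\cdot)$ denotes von Neumann
entropy. Eq. (20) holds because, by Eq. (8), each $\rho_\phi$ is unitarily equivalent to
$\rho_{IS}$ and so has the same entropy. Defining the uniform signal-phase randomization CP
map $\mathcal{P}$ on $\mathcal{H}_{IS}$ as

$$\mathcal{P}\sigma = \frac{1}{2\pi} \int_0^{2\pi} d\phi\; U_\phi\, \sigma\, U_\phi^\dagger, \qquad (22)$$

we have $S(\bar{\rho}) \le S(\mathcal{P}\bar{\rho})$ because $\mathcal{P}$ is a unital CP map
[16]. It is readily verified that $\mathcal{P}\bar{\rho}$ is independent of $P_\Phi(\phi)$ and
equals

$$\mathcal{P}\bar{\rho} = \sum_{l=0}^{\infty} \sum_{n=l}^{\infty} p_n B_\eta(n, l)\; |\theta_n\rangle_I\langle\theta_n| \otimes |n - l\rangle_S\langle n - l|. \qquad (23)$$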



The orthogonality of $\{|\theta_n\rangle_I\}$ then implies that

$$S(\mathcal{P}\bar{\rho}) = H(N, N - L), \qquad (24)$$

where $N$ and $N - L$ are the classical random variables corresponding to a measurement on
$\mathcal{P}\bar{\rho}$ of the $\{|\theta_n\rangle_I\}$ basis on the idler mode and the photon
number in the signal mode respectively, and $H(\cdot)$ is the Shannon entropy. Similarly, the
orthogonality of $\{|\chi_l\rangle_{IS}\}$ implies that

$$S(\rho_{IS}) = H(L). \qquad (25)$$

Combining the above facts, we have

$$\begin{aligned}
C_{ph}(N_S) &\le S(\mathcal{P}\bar{\rho}) - S(\rho_{IS}) && (26)\\
&= H(N, N - L) - H(L) \\
&= H(N, L) - H(L) \\
&= H(L|N) - [H(L) - H(N)] \\
&= \sum_n p_n H(L|N = n) - [H(L) - H(N)] && (27)\\
&\le \sum_n p_n\, \frac{1}{2} \ln\left\{2\pi e \left[\eta(1-\eta)\, n + \frac{1}{12}\right]\right\} - [H(L) - H(N)] && (28)\\
&\le \frac{1}{2} \ln\left\{2\pi e \left[\eta(1-\eta)\, N_S + \frac{1}{12}\right]\right\} - [H(L) - H(N)]. && (29)
\end{aligned}$$

FIG. 3. The single-mode input state and loss channel $\mathcal{L}$ for which the entropy gain
from input to output equals $H(L) - H(N)$ of Eq. (29).



Here we have used standard entropy manipulations to obtain Eq. (27). To obtain (28), we have
applied the bound

$$H(X) \le \frac{1}{2} \ln\left[2\pi e \left(\mathrm{Var}\, X + \frac{1}{12}\right)\right] \qquad (30)$$

on the Shannon entropy $H(X)$ of a discrete random variable $X$ in terms of its variance (see
Problem 8.7, p. 258 of [13]), substituting the variance $\eta(1-\eta)\, n$ of $L$ conditioned
on $N = n$ ($L$ has a binomial distribution in this case). Finally, Eq. (29) follows from the
concavity of the logarithm.
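The two inequalities leading from Eq. (27) to Eq. (29) can be checked numerically for a concrete input. In the sketch below, the five-point photon-number distribution and the value of $\eta$ are hypothetical; only the first term of each line of the chain is computed, since the bracket $[H(L) - H(N)]$ is common to all three.

```python
import math

def entropy(dist):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(x * math.log(x) for x in dist if x > 0.0)

def binomial(n, prob):
    """Distribution of the number of successes in n independent trials."""
    return [math.comb(n, l) * prob ** l * (1.0 - prob) ** (n - l) for l in range(n + 1)]

def variance_entropy_bound(var):
    """Right-hand side of Eq. (30): (1/2) ln[2 pi e (Var X + 1/12)]."""
    return 0.5 * math.log(2.0 * math.pi * math.e * (var + 1.0 / 12.0))

eta = 0.7
# Hypothetical photon-number distribution p_n on n = 0..4 (not from the paper).
p = [0.1, 0.2, 0.4, 0.2, 0.1]
n_s = sum(n * p_n for n, p_n in enumerate(p))

# Given N = n, the number of lost photons L is binomial(n, 1 - eta).
h_cond = [entropy(binomial(n, 1.0 - eta)) for n in range(len(p))]
term_27 = sum(p_n * h for p_n, h in zip(p, h_cond))
term_28 = sum(p_n * variance_entropy_bound(eta * (1.0 - eta) * n)
              for n, p_n in enumerate(p))
term_29 = variance_entropy_bound(eta * (1.0 - eta) * n_s)
print(f"H(L|N) = {term_27:.4f} <= {term_28:.4f} <= {term_29:.4f}")
```

The first inequality uses Eq. (30) term by term, and the second is Jensen's inequality applied to the concave function of $n$ on the right of Eq. (30).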



We now bound the second term in Eq. (29). Consider the single-mode pure loss channel
$\mathcal{L}$ depicted in Fig. 3. It is readily verified that, for the input state
$\rho_{in} = \sum_n p_n |n\rangle_S\langle n|$, the channel outputs the state
$\rho_{out} = \sum_l q_l |l\rangle_S\langle l|$ so that the entropy gain from input to output
is precisely $H(L) - H(N)$. For the channel $\mathcal{L}$ of Fig. 3, Holevo has shown (see
Theorem 2 of [29]) that the minimum entropy gain is

$$\inf_{\rho_{in} \in \mathcal{H}_S} \left[S(\mathcal{L}\rho_{in}) - S(\rho_{in})\right] = \ln(1 - \eta), \qquad (31)$$

where the infimum is over all input states in $\mathcal{H}_S$ and therefore includes the input
state of Fig. 3. We thus have, for all probe states $|\psi\rangle_{IS}$,

$$\ln(1 - \eta) \le H(L) - H(N), \qquad (32)$$

which, combined with (29), gives

$$C_{ph}(N_S) \le \frac{1}{2} \ln\left\{\frac{2\pi e \left[\eta(1-\eta)\, N_S + \frac{1}{12}\right]}{(1-\eta)^2}\right\} \equiv \overline{C}_{ph}(N_S), \qquad (33)$$

which is the sought upper bound on $C_{ph}(N_S)$.

The final step is to use the lower bound (6) and the Information Transmission Inequality to
get the lower bound

$$\delta\Phi^2 \ge \underline{D}(\overline{C}_{ph}(N_S)) = \frac{Q_\Phi\, (1-\eta)^2}{2\pi e \left[\eta(1-\eta)\, N_S + \frac{1}{12}\right]} \qquad (34)$$

on the MSE that exhibits SQL scaling $\delta\Phi^2 \sim 1/N_S$ for large $N_S$, implying that
for any nonzero amount of loss, nonclassical states of light are not much superior to coherent
states for phase sensing, at least when $M = 1$. Note that the bound (34) does not reduce to
Eq. (11) as $\eta \to 1$ because the estimate (32) is not tight in that limit [30]. The
following bound applicable to lossy phase estimation that is in terms of the mean and variance
of the photon number of the input state and exhibits SQL scaling has been obtained in [31]:

$$E\left[(\hat{\Phi} - \phi)^2\right] \ge \frac{1-\eta}{4\eta N_S} + \frac{1}{4\langle\Delta N_S^2\rangle}. \qquad (35)$$

The left-hand side is the mean-squared error achieved for a particular (but arbitrary) value
of $\phi$ and $\langle\Delta N_S^2\rangle$ is the variance of the signal photon number. This
bound, based as it is on the quantum Cramér-Rao inequality, is valid for every value of $\phi$
provided the estimate $\hat{\Phi}$ is unbiased. Note that, in the region of large
$\langle\Delta N_S^2\rangle$, which is achievable for any finite $N_S$ [32], the bounds (35)
and (34) are rather similar in form.
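To see the SQL scaling of Eq. (34) numerically, the sketch below evaluates the capacity upper bound of Eq. (33) and the resulting MSE bound, alongside the lossless bound of Eq. (11) for reference. The uniform prior on $[0, 2\pi)$ and the transmittance $\eta = 0.9$ are illustrative assumptions.

```python
import math

def capacity_upper_bound(n_s, eta):
    """Upper bound of Eq. (33) on the lossy phase-modulation capacity (nats)."""
    inner = 2.0 * math.pi * math.e * (eta * (1.0 - eta) * n_s + 1.0 / 12.0)
    return 0.5 * math.log(inner / (1.0 - eta) ** 2)

def lossy_mse_bound(n_s, eta, q_phi):
    """Right-hand side of Eq. (34): Q_Phi * exp(-2 * capacity upper bound)."""
    return q_phi * math.exp(-2.0 * capacity_upper_bound(n_s, eta))

def lossless_mse_bound(n_s, q_phi):
    """Right-hand side of Eq. (11), shown only for comparison."""
    return q_phi / (math.e ** 2 * (n_s + 1.0) ** 2)

q_phi = 2.0 * math.pi / math.e    # uniform prior on [0, 2*pi), an illustrative choice
eta = 0.9                          # illustrative transmittance
for n_s in (1e1, 1e2, 1e3, 1e4):
    lossy = lossy_mse_bound(n_s, eta, q_phi)
    print(f"N_S = {n_s:8.0f}: Eq. (34) = {lossy:.3e}, "
          f"N_S * Eq. (34) = {n_s * lossy:.3e}, "
          f"Eq. (11) = {lossless_mse_bound(n_s, q_phi):.3e}")
```

The product of $N_S$ and the bound (34) approaches the constant $Q_\Phi (1-\eta) / (2\pi e\, \eta)$ at large $N_S$, whereas the lossless expression keeps falling as $1/N_S^2$.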



V. CONCLUSION 

We have developed the rate-distortion approach to 
lower bounds on the MSE of optical phase estimation. 



We have obtained an H limit for lossless phase estima- 
tion and a lower bound with SQL scaling in the presence 
of loss. It is hoped that the approach of this work can be 
extended to obtain performance bounds on the estima- 
tion of other system parameters using quantum states of 
light.